Medical and Pharmacy Coverage Decision Making at the Population Level

Medicare is one of the largest health care payers in the United States. As a result, its decisions about coverage have profound implications for patient access to care. In this commentary, the authors describe how Medicare used evidence on heterogeneity of treatment effects to make population-based decisions on health care coverage for implantable cardiac defibrillators. This case is discussed in the context of the rapidly expanding availability of comparative effectiveness research. While there is a potential tension between population-based and patient-centered decision making, the expanded diversity of populations and settings included in comparative effectiveness research can provide useful information for making more discerning and informed policy and clinical decisions.

Health care decision makers face many types of challenges when developing medical and pharmacy policy, especially regarding the quality and precision of the evidence available for decisions about coverage for medical technologies. Comparative effectiveness research (CER) aims to produce the type of evidence that will "assist consumers, clinicians, purchasers, and policy makers to make informed decisions [to] improve health care at both the individual and population levels." 1 While health care decision makers, by necessity, must make population-based decisions, the creation of the Patient-Centered Outcomes Research Institute (PCORI) and its mission exemplify an increasing focus on generating and disseminating improved evidence to inform treatment choices at the patient level. 2 Despite the natural tension between population-based and individual patient decision making, greater precision about which interventions work best for which individuals and subgroups within populations can only be helpful for decision makers.
Trends in health care spending, as well as the current uneven state of evidence, shape the environment for decision making. Most economists agree that the major driver of growth in health care spending over time is new medical technology, along with increasing use of existing technology. [3][4][5][6][7][8][9][10] Decision makers try to make the most judicious decisions possible for their programs and must frequently ask these questions: What is the best treatment in the appropriate population? Where does it work best? Where might there be only marginal benefit, or risks that outweigh the expected benefits?
In this commentary, we present a case study from the Medicare program on coverage for implantable cardiac defibrillators in patients at risk of sudden cardiac death. This example presents a clear tension between the programmatic need to make broad population-based coverage decisions and the more individualized approach of making targeted decisions about specific subgroups. As CER extends studies into the more diverse populations and settings that interest health care decision makers, it also brings the increasingly complex analytical challenge of interpreting treatment heterogeneity.

■■ Health-Spending Trends
Although the rate of growth of health care spending has slowed in recent years, the United States continues to spend a substantially higher proportion of its gross domestic product (GDP) on health care than other developed countries. 11 According to data recently published by the Organization for Economic Cooperation and Development, the United States spends roughly 18% of its GDP on health care, which is about 30% above what would be predicted given its relative wealth (Figure 1). This high rate of spending is occurring with little evidence that major health indicators are improved, implying that resources are being used inefficiently. Excessive spending on health care reduces global competitiveness and diverts resources from other desirable societal needs.
Health technology is considered by many experts to be a primary driving force behind the growth of health spending, with estimates of its contribution ranging from 38% to over 60%, depending on the analytic methods used and how health care technology is defined. [3][4][5][6][7][8][9][10] Prior to the passage of the Affordable Care Act, substantial attention was focused on the potential for CER to help control health care costs. Reflecting this view, Peter Orszag, former Director of the Office of Management and Budget, argued in congressional testimony that "better information about the costs and benefits of different treatment options, combined with new incentive structures reflecting the information . . . is essential to putting the country on a sounder long-term fiscal path." 12 This framing highlighted the view that health spending represents the primary challenge for long-term fiscal viability: an urgent economic challenge and not simply a health policy problem. The proposed mechanism for this effect is that CER would significantly reduce gaps in evidence about the effectiveness of alternative health care services, reducing variation in the use of effective and ineffective services and thereby achieving better health outcomes at lower overall spending levels. Ensuring that health technologies are used judiciously in the most appropriate population is a major concern to health care decision makers and, more importantly, may be critical to bending the health spending trends that threaten long-term fiscal viability.
[Figure 1: Health Spending Per Capita (USD PPP)]

■■ Current State of Evidence
Currently, many medical decisions are made with limited or poor evidence; in many cases, the necessary studies have not been done. This is in sharp contrast to the type of information that is readily available to consumers for many other goods and services. For example, a person interested in purchasing a power drill can find detailed, comparative information about various dimensions of performance (e.g., speed, power, noise) for different brands and models in product review magazines, such as Consumer Reports. The desired performance features are identified through consumer surveys and focus groups, and performance information for the important features is then generated through extensive product testing. Equipped with this information, consumers can make informed purchases that reflect their personal preferences about how a product performs relative to the cost and performance attributes of alternatives. Unfortunately for health care decision makers, information about the comparative performance of health interventions in areas that matter most to patients and clinicians is often unavailable.
To illustrate the current lack of high-quality evidence available for making informed treatment choices in health care, one can read any number of systematic reviews that appraise and synthesize the evidence about clinical effectiveness for specific interventions. As an example, the evidence report produced by the Agency for Healthcare Research and Quality comparing various forms of radiation therapy for treatment of clinically localized prostate cancer identified thousands of clinical articles about the therapies' effects on disease-specific survival and adverse events. 13 However, after detailed review of all available studies, the report concluded that the evidence was generally insufficient for guiding clinical decisions or policy making. A broad range of problems with the design, conduct, and reporting of these studies accounted for the discrepancy between the volume of data available and the ability to draw confident conclusions about which treatments should be preferred.
As another illustration, researchers from Duke University found that most clinical practice guidelines in cardiology are based on low- or moderate-quality evidence, as shown in Figure 2. This chart, produced from data published in the Journal of the American Medical Association in 2009, 14 illustrates the quality of evidence used in developing clinical practice guidelines for various types of cardiac disease. Three levels of evidence (A = multiple randomized controlled trials [RCTs] or meta-analyses; B = a single randomized trial or nonrandomized studies; and C = consensus opinion, case studies, or standards of care) are shown for each cardiac condition (e.g., stable angina) with a practice guideline. Notably, over half of the recommendations for these common cardiac conditions are based on level C evidence; for valvular heart disease, more than two-thirds of the recommendations are based on this relatively low level of evidence.
Paradoxically, the lack of high-quality evidence is not due to a limited volume of clinical research. Over 19,000 RCTs are now published every year, 15 and tens of thousands of other clinical studies are reported in the literature as well. The National Institutes of Health (NIH) spends over $30 billion on clinical research each year, 16 and the pharmaceutical industry spends even more. 17 Despite this investment, systematic reviews intended to inform clinical and health policy decisions routinely conclude that the evidence is insufficient for developing policy, as the previous examples illustrate. Clearly, much of the research being conducted and published suffers from a range of limitations, often catalogued in systematic reviews, that make it inadequate from the perspective of decision makers. CER holds the promise of improving the quality and volume of relevant, credible evidence available to decision makers.

[Figure 2: Level of Evidence]
Some of the shortcomings of existing clinical studies are that they include highly selected research subjects, are conducted in research settings that are not typical of those in which care is usually delivered, use inappropriate comparators or omit comparators altogether, and measure physiologic or surrogate outcomes rather than outcomes that matter to patients and other decision makers, such as functional status or long-term impact. 18 CER aims to fill these gaps by showing how technologies work under usual care circumstances in a broad population. 19 PCORI was created to improve health care delivery and outcomes by producing and promoting high-integrity, evidence-based information that comes from research guided by patients, caregivers, and the broader health care community. 20 PCORI will enhance knowledge by supporting new research as well as the analysis and synthesis of existing research. The statutory language defining PCORI authorizes research with a strong "patient-centered" orientation.
The working definition of patient-centered outcomes research (PCOR) again highlights the conflict between deciding what is best for the individual and what is best for a population. Although closely related to CER, PCOR is framed around a set of patient-focused questions. 2 Health care decision makers need to make population-based decisions, but as PCOR makes more granular evidence available, they will want to use this type of evidence to formulate more targeted policies.

■■ Case Study: Medicare Decisions About Paying for Implantable Cardiac Defibrillators
Medicare's decision making about implantable cardiac defibrillators (ICDs) illustrates both the potential benefit and downside of policymakers using information on heterogeneity of treatment effects for formulating coverage decisions. To provide some context, Medicare coverage policy is guided by the following statutory language: "Notwithstanding any other provisions of law . . . no payment may be made . . . for items or services . . . [which] are not reasonable and necessary for the diagnosis or treatment of illness or injury." 21 The debate over what is meant by "reasonable and necessary" has existed for several decades. Those terms have never been defined in law or in regulation, and there is uncertainty about what they mean. The Centers for Medicare & Medicaid Services (CMS) uses a working definition with 2 benchmarks: (1) whether there is adequate evidence to conclude that the technology improves net health outcomes that matter to patients and (2) whether the findings are generalizable to the Medicare population.
Medicare has covered ICDs for persons with a history of ventricular arrhythmias since the 1980s. However, in 2001, results from the Multicenter Automatic Defibrillator Implantation Trial (MADIT-II) became available that would have greatly expanded the eligible population to those at risk for sudden cardiac death but with no history of arrhythmia, with substantial budget implications for the Medicare program. 22 The rationale for MADIT-II was that fatal arrhythmias could occur without warning in patients with no history of ventricular arrhythmia after a myocardial infarction that left lasting damage to the heart muscle. MADIT-II enrolled patients with a prior myocardial infarction and poor heart function (an ejection fraction of less than 30%), irrespective of their history of ventricular arrhythmia. Patients were randomized to either prophylactic use of an ICD with conventional drug treatment or conventional drug treatment alone. The reduction in mortality provided by ICDs in this population was so dramatic (31% reduction in all-cause mortality) that the study was terminated early.
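A relative reduction such as 31% does not by itself say how many patients must receive a device to prevent one death; that depends on baseline risk, which is exactly what varies across subgroups. The sketch below illustrates the arithmetic only. The baseline risks are hypothetical, not MADIT-II results, and treating the 31% as a constant ratio of risks (0.69) in every population is a simplifying assumption.

```python
def risk_summary(control_risk: float, treated_risk: float) -> dict:
    """Summarize a two-arm comparison of event risks given as proportions (0 to 1)."""
    arr = control_risk - treated_risk              # absolute risk reduction
    rrr = arr / control_risk                       # relative risk reduction
    nnt = 1 / arr if arr > 0 else float("inf")     # patients treated per event avoided
    return {"ARR": arr, "RRR": rrr, "NNT": nnt}


# Hypothetical baseline risks; the treated risk assumes a constant risk ratio of 0.69,
# i.e., the same ~31% relative reduction in every population (a simplifying assumption).
for baseline in (0.20, 0.05):
    s = risk_summary(baseline, baseline * 0.69)
    print(f"baseline risk {baseline:.0%}: RRR {s['RRR']:.0%}, "
          f"ARR {s['ARR']:.1%}, NNT {s['NNT']:.0f}")
```

With the same relative reduction, a fourfold lower baseline risk means roughly four times as many patients must receive a device to prevent one death, which is the arithmetic behind interest in targeting coverage to higher-risk subgroups.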
By amending Medicare coverage to include all persons eligible for MADIT-II, the program would have expanded the potential population for ICDs by approximately 2-3 million prevalent patients, with 40,000 new implantations covered each year. Ultimately, this single expansion in coverage would have increased Medicare spending by an estimated $2-$3 billion if all eligible patients received a device. Apart from concerns about the implications for public spending, other studies, as well as subsequent analyses of MADIT-II, suggested that within this broad group of potentially eligible patients, some subpopulations might benefit markedly from ICD implantation, while others might derive only modest improvement, if any.
Medicare officials were interested in a personalized medicine approach: identifying the patients likely to benefit most, while sparing patients unlikely to benefit from the discomfort and risks of device implantation. The agency requested the raw data from the MADIT-II investigators in order to examine possible subpopulation effects. When the study sample was divided into patients with a wide QRS interval on their EKG (a known predictor of more frequent arrhythmias) and those with a normal QRS interval, the wide-QRS group showed a dramatic reduction in mortality, while the normal-QRS group did not appear to gain any survival benefit (Figure 3). The chief concern with this finding was that it came from a retrospective subgroup analysis of the trial data and, while biologically plausible, might have been due to chance.
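The concern that a post hoc subgroup finding may reflect chance can be made concrete. If a trial population with no true subgroup differences is split into k independent subgroup comparisons, each tested at a 5% significance level, the probability that at least one looks "significant" is 1 - (1 - 0.05)^k. The sketch below computes that formula and checks it by simulation, using the fact that p-values are uniformly distributed when there is no real effect. It assumes independent tests; real subgroups often overlap, so this is an approximation, and the function names are illustrative.

```python
import random


def familywise_error(k: int, alpha: float = 0.05) -> float:
    """Probability that at least one of k independent null tests has p < alpha."""
    return 1 - (1 - alpha) ** k


def simulate_familywise_error(k: int, alpha: float = 0.05,
                              trials: int = 20_000, seed: int = 1) -> float:
    """Monte Carlo check: under the null hypothesis, p-values are uniform on [0, 1]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # One simulated trial: k subgroup tests, none with a real effect
        if any(rng.random() < alpha for _ in range(k)):
            hits += 1
    return hits / trials


for k in (1, 5, 10, 20):
    print(k, round(familywise_error(k), 3), round(simulate_familywise_error(k), 3))
```

Across 10 post hoc subgroups, the chance of at least one spurious "significant" result is roughly 40%, which is why striking post hoc subgroup findings warrant confirmation in further studies.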
These findings left the agency with a dilemma. Should it expand coverage to all post-myocardial infarction patients with an ejection fraction of less than 30% and thus provide access to all those who could potentially benefit, while also potentially exposing a large number of patients to procedures they did not need? Or should it cover only people with a low ejection fraction and a wide QRS interval, thus targeting coverage to the group who could benefit most and reducing unnecessary surgical risks and harms in people who likely would not benefit? After much deliberation, the agency decided to cover only people with a low ejection fraction and a wide QRS duration, while noting that the decision would be revisited when the results of a very large NIH clinical trial became available in approximately 6 months. The response to this determination from the provider community was widespread and vitriolic, even extending to Wall Street investors, who debated the appropriate use of post hoc subgroup analyses. 23 What is interesting about this case is that the decision to restrict coverage to a specific subpopulation amplified the dialogue about the appropriate use of evidence-based medicine in coverage decision making to a much greater extent than had previously occurred in the Medicare program.
As anticipated in Medicare's original policy, results from another trial, the Sudden Cardiac Death in Heart Failure Trial (SCD-HeFT), were published within about 6 months. 24 This study examined the benefits of ICD implantation in a slightly different patient population: those with congestive heart failure due to coronary disease as well as other causes. SCD-HeFT randomized patients, all receiving conventional heart failure therapy, to placebo, amiodarone, or an ICD. Again, ICD implantation markedly improved overall survival. Investigators also performed a subgroup analysis comparing patients with a wide QRS interval versus those with a narrower interval. The survival benefit of the ICD did not differ significantly between these 2 groups, although a trend toward a smaller effect was still seen in the subgroup with a normal QRS. Based on the aggregate evidence then available, Medicare expanded coverage to the patient population with an ejection fraction under 30%, irrespective of the QRS interval and history of ventricular arrhythmia. 25 Why might the findings about the QRS subgroups differ between these 2 studies? The enrolled populations were somewhat different, but no one knows the answer for certain beyond the well-known uncertainties of subgroup analyses that were not prespecified. Both trials were large (N = 1,232 and N = 2,521, respectively), rigorously designed, and focused on an important end point (survival), all of which meet many of the expectations for good CER. Both studies highlight the difficulties in predicting which patients would be the best candidates for ICDs. Multiple planned and post hoc subgroup analyses were conducted to identify a characteristic, or a constellation of clinical characteristics, that would best identify the population that would benefit most from an ICD.
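Whether two subgroups truly differ is better judged by a test of interaction than by noting that one subgroup's effect is significant and the other's is not. One common approach works on the log hazard ratio scale: recover each subgroup's standard error from its 95% confidence interval, then compare the difference between the two log hazard ratios with the standard error of that difference. The sketch below assumes independent subgroups and symmetric intervals on the log scale; the hazard ratios and intervals are hypothetical and are not MADIT-II or SCD-HeFT results.

```python
import math
from statistics import NormalDist


def log_se_from_ci(lower: float, upper: float, z: float = 1.96) -> float:
    """Standard error of log(HR) recovered from a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def interaction_test(hr_a, ci_a, hr_b, ci_b):
    """Two-sided z-test for a difference between two independent subgroup hazard ratios.

    Returns the ratio of hazard ratios (subgroup A relative to B) and its p-value.
    """
    diff = math.log(hr_a) - math.log(hr_b)
    se = math.hypot(log_se_from_ci(*ci_a), log_se_from_ci(*ci_b))
    z = diff / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return math.exp(diff), p


# Hypothetical subgroups: wide QRS (A) vs. normal QRS (B)
ratio, p = interaction_test(0.60, (0.45, 0.80), 0.85, (0.60, 1.20))
print(f"ratio of hazard ratios = {ratio:.2f}, interaction p = {p:.2f}")
```

In this hypothetical example, subgroup A's interval excludes 1 and subgroup B's does not, yet the interaction test does not reach conventional significance; this is the kind of pattern that makes subgroup contrasts fragile.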
The desire to better understand which patients are the most appropriate candidates for an ICD was one of the primary considerations that motivated Medicare to require that all patients receiving an ICD be enrolled in a national clinical registry. 26 The strong survival benefit found in these trials has been replicated through analyses of patients enrolled in this registry. This is somewhat surprising, since ICDs are being implanted in older, more complex patients under conditions of usual care. 27 Despite the flurry of research to better define appropriate ICD candidates, we do not yet understand which patient characteristics can predict who would benefit most. However, there are several plausible hypotheses, including patients without raised levels of B-type natriuretic peptides or left bundle branch block, younger candidates, or a combination of these risk factors. 28,29 Answering the question of who would benefit most is nevertheless of vital importance, not only because of the cost to the program but also because of the desire to reduce the harms of unnecessary surgery. Up to 80% of patients who meet the criteria for implantation will never have a cardiac event requiring an ICD, yet the harms of unnecessary surgery and accidental firing of ICDs are not inconsequential. 30
[Figure 3: Survival Probability for MADIT-II Study Population by Width of QRS Interval]

■■ Techniques to Deal with Heterogeneity of Treatment Effects
As we transition from explanatory trials into CER (efficacy vs. real-world effectiveness), understanding the impact of heterogeneity of treatment effects (HTE) on the findings becomes more imperative. The pharmaceutical industry has moved away from focusing on narrow, homogeneous populations and toward including a wide variety of people of interest to payers, such as those with comorbidities, the elderly, and minority populations. As a decision maker, what needs to be asked of these studies? Treatment heterogeneity is defined as nonrandom variation, within particular subpopulations, in the magnitude or direction of the effect of the study intervention on outcomes. 31 The idea is that while some people may be helped by an intervention, others may be harmed, and for others there is no discernible effect.
Kent and Hayward (2007) described how, in most trials and studies, outcome risk is not normally distributed; rather, it is likely to be highly skewed, with the risk for adverse outcomes concentrated in a very small group. 32 As examples, the authors noted previous research that showed a 10-fold difference in the risk for mortality across subpopulations after myocardial infarction and a 70-fold difference in the risk for progression from chronic kidney disease to end-stage renal disease across cohorts defined by clinical characteristics.
There are a variety of sources of heterogeneity in response to treatment. Outcomes vary based on how providers deliver care, how patients adhere to treatment, and the physical or psychosocial environment in which care is delivered. 33 While . . .
Analytical approaches to HTE fall into a few distinct categories. 31 The most robust is a confirmatory analysis, which is hypothesis driven, with prespecified subgroups and a predefined statistical plan. The least robust is an exploratory analysis conducted after the data are collected (post hoc). Exploratory analyses can be useful for refining hypotheses to improve the design of future trials, which can then provide more informative confirmatory subgroup analyses. In general, the U.S. Food and Drug Administration and health care decision makers are skeptical of the results of post hoc analyses, 34 but both types of analysis can be informative as part of a broader package of evidence and when interpreted within a specific context. The exploratory analyses done in the ICD case study were post hoc, and although the initial coverage decision based on these results was controversial, the findings later stimulated fruitful additional research into what was driving the differences in treatment outcomes identified in the stratified analysis of the trial data.
As recommended by Kent and Hayward, every study protocol should include a plan to address and analyze HTE. Optimally, this plan would include predefined, adequately powered subgroup analyses and risk stratification using a validated multivariable risk prediction tool. 32 Recognizing that industry may not wish to narrow the eligible treatment population and, in some circumstances, has an incentive not to report subgroup analyses, Kent and Hayward call on decision makers to make clear their expectation that such analyses be conducted and reported more consistently.

■■ Conclusion
CER expands the diversity of populations and settings included in studies. Heterogeneity in CER can provide very useful information for making informed policy and clinical decisions. To take advantage of this feature, CER needs to be designed to better explore the sources of HTE. If CER is to deliver on its promise of helping us move toward more patient-centered treatment and policy decisions, studies need to be designed to show, with greater precision, what works best for whom. Decision makers can then use this information to develop more tailored policies, rather than basing decisions on broad population averages. These more tailored policies will help ensure that medical technologies are used by those who would benefit and discourage their use in populations who would be harmed.